mRNA markers for survival prediction in glioblastoma multiforme patients: a systematic review with bioinformatic analyses

Background Glioblastoma multiforme (GBM) is a type of fast-growing brain glioma associated with a very poor prognosis. This study aims to identify key genes whose expression is associated with the overall survival (OS) in patients with GBM. Methods A systematic review was performed using PubMed, Scopus, Cochrane, and Web of Science up to Journey 2024. Two researchers independently extracted the data and assessed the study quality according to the New Castle Ottawa scale (NOS). The genes whose expression was found to be associated with survival were identified and considered in a subsequent bioinformatic study. The products of these genes were also analyzed considering protein-protein interaction (PPI) relationship analysis using STRING. Additionally, the most important genes associated with GBM patients’ survival were also identified using the Cytoscape 3.9.0 software. For final validation, GEPIA and CGGA (mRNAseq_325 and mRNAseq_693) databases were used to conduct OS analyses. Gene set enrichment analysis was performed with GO Biological Process 2023. Results From an initial search of 4104 articles, 255 studies were included from 24 countries. Studies described 613 unique genes whose mRNAs were significantly associated with OS in GBM patients, of which 107 were described in 2 or more studies. Based on the NOS, 131 studies were of high quality, while 124 were considered as low-quality studies. According to the PPI network, 31 key target genes were identified. Pathway analysis revealed five hub genes (IL6, NOTCH1, TGFB1, EGFR, and KDR). However, in the validation study, only, the FN1 gene was significant in three cohorts. Conclusion We successfully identified the most important 31 genes whose products may be considered as potential prognosis biomarkers as well as candidate target genes for innovative therapy of GBM tumors. Supplementary Information The online version contains supplementary material available at 10.1186/s12885-024-12345-z.


Introduction
Glioblastoma multiforme (GBM) is one of the most malignant gliomas of the central nervous system (CNS) [1].The disappointing outcome of GBM treatment is a median survival of only 15 months despite multi-modalities of treatments [2,3].Based on the literature, GBM has special biological characteristics presenting high heterogeneity, diffusing invasiveness, and capacity to resist conventional therapies.In addition, the existence of biological barriers, e.g., BBB, makes this tumor difficult to treat [4].Hence, the development of new methods for the clinical treatment of GBM may be facilitated by identifying the key genes associated with GBM prognosis [5].
Over the last decade, an increased focus has been on clarifying the origin, genomic landscape, and gene expression profile of GBM by identifying specific molecular markers and pathways involved in this pathology [6].The advent of large-scale transcriptomic analyses in various cancers has tremendously increased our understanding of tumor biology and possible cancer therapy approaches [4].Accordingly, in recent years, an increasing number of studies have focused on gene expression patterns to propose biomarkers and GBM tumor treatment strategies [7].However, most of this information has not been translated into clinical practice for GBM patients [7].
The vast quantities of genomic data are now being deposited in public database repositories, such as Array Express (https://www.ebi.ac.uk/arrayexpress/),The Cancer Genome Atlas (TCGA, https://portal.gdc.cancer.gov), Chinese Glioma Genome Atlas (CGGA, http:// www.cgga.org.cn) and Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo/).These genomic data are used by researchers around the world for the discovery of new genes of interest in GBM tumors.Several studies considered numerous mRNA expression datasets and identified gene signature panels to estimate prognosis in GBM tumors to improve the prognostic and predictive assessment of the tumors .However, there is no consensus in the literature on the top gene sets that could be eventually used in clinical practice.
Considering the current state of our knowledge, we sought that a systematic survey of the literature is urgently required to identify genes whose expression could be predictive of GBM survival.Subsequently, to determine the top genes whose expression could be of interest in clinical practice, we assess biological pathways and protein-protein interaction (PPI) networks associated with these genes via bioinformatic analyses.

Materials and methods
The Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA) guideline [263] was followed to conduct the review (Supplementary File 1, Table S1.).PubMed, Scopus, Cochrane, and Web of Science databases were used to search for relevant studies published between 24th February 2003 and 1st January 2024.The search was conducted by the terms "gene expression" or "expressed genes" or "mRNA" or "RNA-Seq" and "survival" or "prognostic" or "biomarker" and "Glioblastoma multiforme (GBM)" or "high-grade glioma".The full search strategy is reported in the supplementary File 1, Table S2.
Inclusion criteria were: (a) clinical study with human participants, (b) bioinformatics analysis study, (c) fulltext articles, (d) published in the English language, (e) published in peer-reviewed journals, and (f ) only genes related to GBM were considered.The exclusion criteria were as follows: (a) reviews, letters to the editor, and abstracts, (b) duplicate publications, (c) Plasma biomarker study, (d) participants with immunohistochemical (IHC) and Western blot analysis, (e) cell line study, (f ) studies that did not observe a significant correlation between mRNA expression and overall survival, (g) recurrent glioblastoma, (h) pseudogene, and (i) animal study and progression-free survival (PFS) were not considered.

Data extraction and quality assessment
Two independent authors (PA and TA) assessed and extracted all relevant articles.For each study, the following items were extracted: first author, publication year, country, mRNA, increased expression, decreased expression, public gene databases, detection method, short survival, long survival, and area under curve (AUC) for gene panel.The Newcastle-Ottawa Scale (NOS) was used to evaluate the quality of the eligible articles for case-control studies.NOS involves three perspectives: study group selection, group comparability, and whether the exposure or the outcome of interest for a case-control study is listed in the scale.Each study can obtain a maximum of nine stars [264].Studies scoring above the median NOS value were considered as high quality (low risk of bias) and those scoring below the median value were considered as low quality (high risk of bias).A summary of the method of quality evaluation is presented in Table 1.

Protein-protein interaction (PPI) network and signaling pathways analysis
All 613 genes (with p < 0.05) obtained from this review study were considered in the bioinformatic analysis.The PPI network was constructed by Cytoscape software (version 3.9.0;https://cytoscape.org/).The top important nodes of the PPI network were obtained based on the Cytohubba plug-in.The 5 well-known central indices, including degree, stress, betweenness, closeness, and radiality of nodes, were considered to rank the network nodes.The top 10% of genes were determined in each metric.Then, common genes were identified between five metrics.Finally, between common genes, proteins with a high degree of centrality were selected and were considered the most important ones to investigate their association with survival in GBM patients.Moreover, the top 10 genes ranked by degree are calculated.
A pathway analysis using the GO Biological Process (GOBP) 2023 database through the ENRICHR package (https://maayanlab.cloud/Enrichr/,accessed on 23 March 2024) was then performed for further specified related mechanisms involved in cancer such as cell proliferation, differentiation, apoptosis, mitosis, angiogenesis, and stemness.Only GOBP terms with adjusted p-value < 0.01 by ENRICHR analysis were used.

Survival analysis and validation of the gene expression in the GEPIA2 and CGGA datasets
To confirm the reliability of the identified gene from the PPI network, Kaplan-Meier curves were created according to the GEPIA2 (http://gepia2.cancer-pku.cn)and the CGGA (http://www.cgga.org.cn)databases.CGGA contained two glioma data sets, namely, mRNAseq_325 and mRNAseq_693.Primary GBM of CGGA (mRNA-seq_325) and CGGA (mRNAseq_693) data were considered.To determine differences in overall survival for patients with a low and high gene-expressing GBM, OS Kaplan-Meier analysis was performed by the GEPIA2 using the TCGA gene expression dataset and CGGA online applications.Kaplan-Meier curves were generated with a 50% median expression cutoff for high-and low-expressing groups.The estimation of hazard ratios was done by Cox proportional hazards model regression analysis.A 95% confidence interval was set and used.P < 0.05 was a statistically significant difference in validation cohorts from GEPIA2 and CGGA.
In the 720 genes studies, 613 unique genes were identified whose expression was associated with overall survival in GBM, of which 107 were described in two or more studies.See Supplementary Table 2 for details about the number of studies that described each gene, and whether or not it was found to be upregulated, downregulated, and the databases used.

NOS assessment
The risk of bias evaluation of the included studies for case-control studies according to the NOS is shown in Supplementary Tables 2 and Supplementary File 1, Table S4.Based on the NOS, the median score of the included studies was 7.Among the 255 studies, 131 studies that scored ≥ 7 were considered to present a low risk of bias.124 of the studies were considered with a high risk of bias since they scored b < 7.
The list of the top 31 genes is used as input for computing enrichment.As a result, 1271 GOBP terms were found and 11 GOBP terms were considered.The complete list of significantly enriched GOBP terms and related genes is given in Table 3.

Discussion
To the best of our knowledge, this systematic literature review is the most comprehensive review of gene expression for predicting GBM overall survival outcomes.The most 31 important genes including IL6, EGFR, STAT3, MMP9, CD44, FN1, CD4, TGFB1, CXCL8, CCL2, IL10, ICAM1, IL1A, CD274, KDR, SPP1, ITGB2, CDKN2A, PARP1, MYD88, AGT, NOTCH1, SERPINE1, TNFRSF1A, CDK1, CAV1, ITGB3, CDK4, FOXO3, MDM2, and PROM1, respectively, were considered as candidate biomarkers for GBM survival.Our analyses showed that in fact they all could be considered biomarkers.Nevertheless, based on the search strategy (Supplementary File 1, Table S2.), this review aimed to conduct a comprehensive, systematic literature review to identify all relevant studies that have significantly reported genes related to overall survival in GBM patients.However, some impact reports on this topic might have been missed due to limitations in the search strategy [265][266].In the study future, given the well-established heterogeneity of GBM, the assessment of the prognostic value of specific genes must be conducted with consideration for GBM molecular subtypes, to ensure a comprehensive understanding of their impact, and would pave the way for precision medicine [266].
Detection of a specific gene expression in GBM tumors may be used to diagnose the existence of a GBM disease or enable clinicians to select the most effective treatment.As there was heterogeneity among the studied genes, bioinformatic analyses were performed to compile these data.The results identified 31 key genes, which had high weight and good topological properties (degree, stress, betweenness, closeness, and radiality) in the pathogenic networks.In addition, these genes were validated by RT-qPCR assays or bioinformatic analysis of datasets.In this study according to 5 typical nodal metrics, we found the most 31 important genes related to the survival of patients with GBM.However, there is currently no consensus on how to use these metrics for the interpretation of biological networks [267].Therefore, these findings require further investigation.
Identification of survival-associated genes in GBMs has been ongoing over the past decade.However, the gene lists identified by researchers  differ considerably; only 107 common genes from 720 genes could be identified in studies.These differences can be attributed to two major factors.First, researchers have analyzed GBM datasets from various cohorts worldwide.Second, studies have analyzed different types of datasets, obtained using different approaches such as PCR or next-generation sequencing data.Due to technical limitations and cohorts' specificity, the expression profiles of similar genes identified from different datasets may be inconsistent.To improve prognostic and predictive survival power in GBM patients, researchers [69, 91-92, 94-95, 97, 109, 131, 137-138, 144, 147, 172, 186, 189, 195, 197, 200, 222, 242, 251, 253], identified a panel of 2, 3, 4, 6,7, 8, 13, or 14 genes using mRNA expression datasets.They established a risk score model that performed well in survival prediction.High-risk group patients had significantly poorer survival as compared with those in the low-risk group.In this study, AUC for the 1-year overall survival predictions was reported between o.587 [195], to 0.905 [69].This difference may be due to the use of different databases and cohorts' specificities.The obtained 31 mRNA panel in this study, is suggested to predict OS in glioblastoma in various cohorts.
The present study showed that ten hub genes (IL6, EGFR, STAT3, MMP9, CD44, FN1, CD4, TGFB1, CXCL8, CCL2) with higher node degree in PPI networks have been predicted to be survival biomarkers for GBM patients and some have been experimentally validated.These hub genes can be offered to the candidate biomarkers of future research for therapeutic targets in patients with GBM.In addition, this study showed that five hub genes (IL6, NOTCH1, TGFB1, EGFR, and KDR) were involved in most of the pathways, and they can be further investigated for biological discoveries.Moreover, we noticed that cell proliferation, apoptotic process, cell migration, and cell differentiation contain many of these genes (see Table 3).In addition, various studies showed that overexpression of these five genes leads to increased cell proliferation and invasion, and inhibition of apoptosis in glioblastoma tumors and was associated with poor patient survival [268][269][270][271]. Therefore, these GOBP terms may exert a synergistic effect on the survival of GBM, which could be clues to therapeutic strategies for this disease.
One might inquire about the hub genes obtained from this study.Certainly, various computer modeling algorithms and prediction methods have been and are being developed and used to predict outcomes in medical research.It is noted that each modeling approach has its strengths and weaknesses and there is no best one for all cases [272].The best modeling approach is uncertain, and may be obtained by combining more than one model, and research in this field continues [272].By changing the method of prediction, the most important variables will be changed to predict outcomes [272].We identified two hub gene groups that were associated with overall survival in GBM patients.The ten and five hub genes are ranked by degree and pathways analysis methods, respectively.It is noted that, when a different gene selection criterion is applied, the number of genes in the two top-ranking lists of the two methods will also change [273].In this study, the algorithms yielded different topranking gene lists due to their different approach.Interestingly, the two lists of hub genes have three in common, that were selected as the most important genes for the prediction of survival in methods and can be considered as three hub genes (IL6, TGFB1, and EGFR).
In this study, the five and ten lists of hub genes were selected based on two different methods, therefore, the two groups are different [272][273].On the other hand, as we all know, the TCGA and the CGGA databases are the world's largest and most comprehensive gene expression public databases in GBM patients.Hence, these databases were used for validation of our study results.In the validation analysis, we used the GEPIA2, the mRNA_ seq325, and the mRNA_seq693 of the CCGA.Only, the FN1 gene was significant in three cohorts.Although the mRNA_seq693 includes more patients with Grade 4 glioma compared to the mRNA_seq325, only three genes were significant compared to eleven genes observed in the other cohort of the CGGA (Table 3).The differences seen between the three databases can be due to the differences in genetics between the different populations.

Study consistency
Of the 255 manuscripts, all studies were prospective.No randomized trial was found.107 mRNAs (14.9%; 107/720) were common in all studies.However, a large number of studies have not been validated; hence, there was a lack of high-quality evidence in this study.124 studies were rated as fair quality; 131 studies were considered Fig. 3 The top 10 genes in the PPI network, in terms of degree ranking, were regarded as hub genes.The node color changes gradually from yellow to red in ascending order according to the degree of the genes Fig. 4 (a-n) Kaplan-Meier analysis of overall survival for GBM patients in the GEPIA2 using the TCGA cohort (a.FN1; b.CXCL8; and c.TNFRSF1A), the mRNA_seq325 of the CCGA (d.IL6; e. STAT3; f.MMP9; g.FN1; h.CD4; i. CCL2; j.IL10; k.ICAM1; l.KDR; m.MYD88; and n, MDM2), and the mRNA_seq693 of the CCGA (o, FN1, p, NOTCH1, q, CDKN2A) based on low-and high-expression of genes.The red line represents samples with high expression of the genes, and the blue line represents the samples with low expression of genes.Among 31 genes, p < 0.05 was considered to be statistically different to be of high quality.Types of studies and datasets were not consistently reported, resulting in a potential bias.
Among studies in this systematic review, genes with a significant correlation between gene expression and overall survival were considered.Previous studies have found that gene expression levels are associated with prognosis and some genes can be applied to predict the survival risk of GBM patients.However, some studies have conflicts regarding significant differences in gene expression and overall survival.These conflicts seem to depend on the GBM sample size, the heterogeneity of GBM, the datasets used, and the methodologies employed.All the abovementioned may explain why the validation step did not yield significant results.

Study quality
The small sample size in PCR-based studies, the high number of single-center studies, various databases, and the high number of studies from China (161/255), the USA (23/155), and India (10/255), which may affect the quality of studies.The heterogeneity of the studies reduced the quality of the data.In some of these datasets and GBM samples, the type and severity of GBM disease were not specified.In addition, some studies lacked validation of their candidate genes in a GBM patient cohort.Therefore, further research with large sample sizes and validation in GBM patients is warranted.

Strengths, limitations, and future perspectives
To the best of our knowledge, the current study is the first that systematically reviewed published data on gene expression related to the survival of GBM patients.Of note, the major strength of the current systematic review is that bioinformatic analyses were performed, which added new information to the previously studied gene expression on this topic.The findings reported here provide a better view of gene expression biomarkers in predicting the prognosis of patients with GBM.
There are some limitations in our work.Firstly, the search strategy was restricted to the English language literature only, hence, there is a possibility of excluding qualified studies published in other languages.Secondly, the study showed a high level of heterogeneity in the methods used among the included studies.In particular, there were heterogeneities in (1) Variety in disease severity; and (2) Age-and gender-related changes in GBM patients were not considered.Thirdly, the overall survival has been associated with multiple factors such as poor immune response, which was not considered in this study.Additionally, the role of gene expression was not completely clarified in various biological processes and the potential application of these molecules as gene therapies.Hence, future studies are required to clarify the biological roles of the mRNAs to investigate the possibility of their clinical utilization in GBM patients.

Conclusion
Our review suggests that the current evidence for gene expression associated with GBM survival is highly variable.At present, no clear decisions can be made from this systematic review for application into clinical practice.
The key recommendation from this study is that genetic data sharing develops strategies and guidelines in this field that can be used to answer important questions.Moreover, in future a combination of significant genes expression signatures can be applied to identify a powerful and independent predictor for outcome in GBM patients.

Fig. 1
Fig. 1 Flowchart of the selection process

Fig. 2
Fig. 2 The 613 differentially expressed genes were input into STRING database for PPI network analysis, and achieved a PPI network of 602 nodes and 5570 edges, with PPI enrichment p-value < 1.0 × 10-16.The network was constructed by Cytoscape based on the PPI correlations from the STRING database

Table 1
Check list for quality evaluation and scoring of studies based on NOS How representative was the bioinformatic analysis group in comparison with the validation group, and were assess by mRNA expression?(if yes, one star; no star if the patients were selected only in one group) 4. Use of bioinformatic database analysis and specimen verification as RT-qPCR to identify novel biomarkers predicting survival in GBM tumors.Are both of them used?(if yes, one star) Comparability Comparability of bioinformatic analysis dataset results with other datasets or methods of measurements as RT-qPCR basis of the design or analysis (if yes, two stars; one star was assigned if validation and verification was not reported clearly) Outcome assessment 6. Ascertainment of the outcome: clearly defined outcome of mRNA expression, survival analysis, and methods of measurements as RT-qPCR, (yes, two stars for information ascertained; one star if two of this information were not reported) 7. Appropriate statistical analysis: The statistical test used to analyze mRNA and the survival of GBM patients as bioinformatic analysis, RT-qPCR, or microarray were clearly defined and appropriate for bioinformatic analysis group or verification group (if yes, one star; no star was assigned if outcomes were not reported) mRNA, messenger RNA; RT-qPCR, Reverse transcription-quantitative polymerase chain reaction

Table 2
The most 31 important genes related to survival GBM

Table 3
The top enriched gene ontology biological process terms